# Zero-shot Generalization

Visualclozepipeline 384
Apache-2.0
VisualCloze is a universal image generation framework based on visual context learning, supporting generalization across multiple in-domain tasks and unseen tasks, generating target images and intermediate results in a single step.
Text-to-Image
VisualCloze
294
5
Poseless 3B
Apache-2.0
PoseLess is an innovative robotic hand control framework that directly maps 2D images to joint angles using projection representations, eliminating the need for explicit pose estimation.
Multimodal Fusion Transformers
homebrewltd
98
7
Poseless 3B
Apache-2.0
Poseless-3B is a vision-language model (VLM)-based robotic hand control framework that directly maps 2D images to joint angles without explicit pose estimation.
Pose Estimation Transformers
Menlo
65
10
Colqwen2.5 V0.1
MIT
A visual retrieval model based on Qwen2.5-VL-3B-Instruct and ColBERT strategy, capable of generating multi-vector representations for text and images to enable efficient document retrieval.
Text-to-Image Safetensors English
vidore
985
0
Colqwen2 V0.1
Apache-2.0
A visual retrieval model based on Qwen2-VL-2B-Instruct and the ColBERT strategy, capable of efficiently indexing documents through their visual features.
Text-to-Image English
vidore
21.25k
170
Sam2 Hiera Large
Apache-2.0
A foundation model for promptable visual segmentation in images and videos, developed by FAIR.
Image Segmentation
facebook
155.85k
68
Openvla 7b
MIT
OpenVLA 7B is an open-source vision-language-action model trained on the Open X-Embodiment dataset, capable of generating robot actions based on language instructions and camera images.
Image-to-Text Transformers English
openvla
1.7M
108
Openvla V01 7b
MIT
OpenVLA v0.1 7B is an open-source vision-language-action model trained on the Open X-Embodiment dataset, supporting control of various robots.
Text-to-Image Transformers English
openvla
30
10
Biomednlp KRISSBERT PubMed UMLS EL
MIT
KRISSBERT is a knowledge-enhanced self-supervised learning model for biomedical entity linking. It trains contextual encoders using unannotated text and domain knowledge to effectively address the diversity and ambiguity of entity names.
Knowledge Graph Transformers English
microsoft
4,643
29
Cxmefzzi
Apache-2.0
A fine-tuned text-to-SQL conversion model based on the T5-3B architecture, significantly improving structured query generation accuracy through PICARD constrained decoding.
Large Language Model Transformers English
tscholak
689
32